Last Update: 2019-02-04 12:57:18

Libraries

Before we start, let’s load a few libraries.

rm(list = ls())

set.seed(100)

options(warn = -1)

library(knitr)
library(ggplot2)
library(caret)
library(doParallel)

registerDoParallel(cores = (detectCores() - 1))

We register all but one core so that model training can run in parallel.
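
One caveat: detectCores() can return NA when the core count is unknown, and on a single-core machine detectCores() - 1 is zero. Here’s a minimal sketch of a guard. The helper pick.cores is hypothetical, not part of the original setup:

```r
library(parallel)

# Hypothetical helper: leave one core free, but never go below one worker.
pick.cores = function(available) {
    if (is.na(available)) {
        # detectCores() could not determine the count; fall back to one worker.
        return(1L)
    }
    max(1L, as.integer(available) - 1L)
}

n.cores = pick.cores(detectCores())
```

The result could then be passed along as registerDoParallel(cores = n.cores).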

Data Loading

Let’s read in our data.

data.2015 = read.csv("data/2015.csv")
data.2016 = read.csv("data/2016.csv")
data.2017 = read.csv("data/2017.csv")
data.2018 = read.csv("data/2018.csv")
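
The four read.csv calls follow one pattern, so they could also be written as a loop over years. This is only a sketch: read.seasons is a hypothetical helper, and the demo below writes throwaway files to a temporary directory so it runs without the real data.

```r
# Hypothetical helper: read one CSV per season into a list named by year.
read.seasons = function(dir, years) {
    paths = file.path(dir, paste0(years, ".csv"))
    missing = paths[!file.exists(paths)]
    if (length(missing) > 0) {
        stop("Missing season files: ", paste(missing, collapse = ", "))
    }
    seasons = lapply(paths, read.csv)
    names(seasons) = as.character(years)
    seasons
}

# Demo on made-up files (two rows each) in a temporary directory.
demo.dir = tempfile("seasons")
dir.create(demo.dir)
for (year in 2015:2016) {
    write.csv(data.frame(goal = c(0, 1), isPlayoffGame = c(0, 1)),
              file.path(demo.dir, paste0(year, ".csv")),
              row.names = FALSE)
}
demo = read.seasons(demo.dir, 2015:2016)
```

With the real files, read.seasons("data", 2015:2018) would return the same four data frames as a list.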

Now, we will only deal with regular season events. So let’s remove the playoffs from our datasets.

get.regular.season = function(data) {
    subset(data, isPlayoffGame == 0)
}

season.2015 = get.regular.season(data.2015)
season.2016 = get.regular.season(data.2016)
season.2017 = get.regular.season(data.2017)
season.2018 = get.regular.season(data.2018)
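
To see what the filter does, here’s a small sketch on made-up rows (shotID is a hypothetical column used only for illustration). Note that subset() also drops rows where isPlayoffGame is NA, because a missing condition is treated as false.

```r
get.regular.season = function(data) {
    subset(data, isPlayoffGame == 0)
}

# Made-up rows: two regular season shots, one playoff shot, one unknown.
toy = data.frame(shotID = 1:4,
                 isPlayoffGame = c(0, 1, 0, NA))

regular = get.regular.season(toy)
kept.ids = regular$shotID
```

Only shots 1 and 3 survive; the playoff row and the NA row are both removed.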

Now let’s remove extraneous columns. At the end, we will have the following columns (renamed for convenience):

Old Column Name      New Column Name
xCordAdjusted        x
yCordAdjusted        y
shotAngleAdjusted    angle
shotDistance         dist
teamCode             team
goal                 goal

get.helpful.data = function(data) {
    data.frame(x = data$xCordAdjusted,
           y = data$yCordAdjusted,
           angle = data$shotAngleAdjusted,
           dist = data$shotDistance,
           team = data$teamCode,
           goal = data$goal)
}

analysis.2015 = get.helpful.data(season.2015)
analysis.2016 = get.helpful.data(season.2016)
analysis.2017 = get.helpful.data(season.2017)
analysis.2018 = get.helpful.data(season.2018)

Some rows have missing values, so we keep only the complete cases. We also stack the 2015, 2016, and 2017 seasons into analysis.all, which will be our training set; 2018 is kept separate for prediction.

analysis.2015 = analysis.2015[complete.cases(analysis.2015),]
analysis.2016 = analysis.2016[complete.cases(analysis.2016),]
analysis.2017 = analysis.2017[complete.cases(analysis.2017),]
analysis.all = rbind(analysis.2017, rbind(analysis.2016, analysis.2015))
analysis.all = analysis.all[complete.cases(analysis.all),]
analysis.2018 = analysis.2018[complete.cases(analysis.2018),]
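
As a quick illustration of what complete.cases does, here’s a sketch on a made-up data frame (the values are invented, not taken from the shot data). A row is kept only if none of its columns is NA.

```r
# Made-up rows: the second is missing x, the third is missing dist.
toy = data.frame(x = c(10, NA, 30),
                 dist = c(5, 6, NA),
                 goal = c(0, 1, 0))

kept = toy[complete.cases(toy), ]
n.dropped = nrow(toy) - nrow(kept)
```

Comparing nrow() before and after, as n.dropped does here, is an easy way to see how much data the filter removes.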

We’ll also need a function that pulls out a single team’s shots by team code.

get.team.data = function(data, code) {
    subset(data, team == code)
}
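
Here’s a sketch of get.team.data on made-up rows. One assumption to flag: if team is stored as a factor (the default for read.csv before R 4.0), the subset keeps the unused team levels, and droplevels() clears them out.

```r
get.team.data = function(data, code) {
    subset(data, team == code)
}

# Made-up rows; team is made a factor on purpose to show the leftover levels.
toy = data.frame(team = factor(c("PIT", "BOS", "PIT", "T.B")),
                 dist = c(10, 20, 30, 40))

pit = get.team.data(toy, "PIT")
levels.before = nlevels(pit$team)   # still counts BOS and T.B

pit = droplevels(pit)
levels.after = nlevels(pit$team)    # only PIT remains
```

The leftover levels are harmless for the plots in this post, but they can show up as empty groups in tables or legends.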

Creating the Models

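With our data, we can start creating models. We’ll be creating two models, a neural network (nnet) and a k-nearest neighbors model (knn), both tuned with 5-fold cross-validation repeated twice.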

control = trainControl(method = "repeatedcv", number = 5, repeats = 2)

model.nnet = train(goal ~ . -goal -team,
                   data = analysis.all,
                   method = "nnet",
                   trControl = control)

## # weights:  31
## initial  value 79308.617571 
## iter  10 value 21110.433003
## iter  20 value 19408.328843
## iter  30 value 18788.755236
## iter  40 value 18693.310231
## iter  50 value 18597.284689
## iter  60 value 18567.292177
## iter  70 value 18548.738997
## iter  80 value 18537.793165
## iter  90 value 18523.033168
## iter 100 value 18512.476830
## final  value 18512.476830 
## stopped after 100 iterations

model.knn = train(goal ~ . -goal -team,
                  data = analysis.all,
                  method = "knn",
                  trControl = control)
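
To make the repeatedcv setting concrete, here’s a simplified base R sketch of 5-fold cross-validation repeated twice on 20 made-up rows. This is not how caret builds its folds internally; it only illustrates the idea that every candidate model is fit and scored on 5 x 2 = 10 resamples.

```r
set.seed(100)

n.rows = 20      # toy number of shots
number = 5       # folds per repeat, as in trainControl(number = 5)
repeats = 2      # as in trainControl(repeats = 2)

# One vector of fold labels per repeat: every row is assigned a fold 1..5.
fold.labels = lapply(seq_len(repeats), function(r) {
    sample(rep(seq_len(number), length.out = n.rows))
})

# In fold k, the rows labelled k are held out and the rest are used to fit.
held.out = which(fold.labels[[1]] == 1)
fit.rows = setdiff(seq_len(n.rows), held.out)

n.resamples = number * repeats
```

Each repeat reshuffles the rows, so the second set of five folds differs from the first.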

Extracting Predictions

We’ll predict on the 2018 season, which the models never saw during training. Here are the first few rows of analysis.2018:

head(analysis.2018)

Now, we can use the predict function to get our predictions.

nnet.prediction = predict(model.nnet, newdata = analysis.2018)
knn.prediction = predict(model.knn, newdata = analysis.2018)

nnet.prediction.data = data.frame(analysis.2018)
nnet.prediction.data$predict = nnet.prediction

knn.prediction.data = data.frame(analysis.2018)
knn.prediction.data$predict = knn.prediction
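
Since we’ll be reading the predict column as a probability, it’s worth checking that the values really fall between 0 and 1. A minimal sketch, using a hypothetical helper in.unit.interval and made-up numbers rather than the real predictions:

```r
# Hypothetical helper: TRUE only if p is numeric, has no NA, and lies in [0, 1].
in.unit.interval = function(p) {
    is.numeric(p) && !anyNA(p) && all(p >= 0 & p <= 1)
}

ok = in.unit.interval(c(0.02, 0.31, 0.97))
bad = in.unit.interval(c(-0.10, 0.50, 1.20))
```

On the real data this would be called as in.unit.interval(nnet.prediction.data$predict), and likewise for the knn predictions.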

So, the first few rows of our Neural Network data look like:

head(nnet.prediction.data)

And the first few rows of our K-Nearest Neighbors data look like:

head(knn.prediction.data)

Visualizing the Predictions

With predictions in hand, let’s plot predicted scoring probability against shot distance to see how the two models differ.

plot.nnet = ggplot(nnet.prediction.data) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "orange",
             color = "grey") +
    labs(title = "Predicted Goal Probability from Neural Network Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

plot.knn = ggplot(knn.prediction.data) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "orange",
             color = "grey") +
    labs(title = "Predicted Goal Probability from K-Nearest Neighbors Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here are the neural network’s predictions:

plot.nnet

Here are the K-nearest neighbors predictions:

plot.knn

Analysis

Pittsburgh Penguins

Let’s first get their data.

pit.nnet = get.team.data(nnet.prediction.data, "PIT")
pit.knn = get.team.data(knn.prediction.data, "PIT")

Now, let’s see how the Penguins fared in our models.

pit.plot.nnet = ggplot(pit.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#000000",
             color = "#FCB514") +
    labs(title = "Pittsburgh Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

pit.plot.knn = ggplot(pit.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#000000",
             color = "#FCB514") +
    labs(title = "Pittsburgh Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

pit.plot.nnet

Here is the K-nearest neighbors plot:

pit.plot.knn
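
The team sections all use the same plot recipe, with only the data, title, and colors changing. Here’s a sketch of a helper that captures it. The name make.team.plot is hypothetical, and the sketch assumes ggplot2 3.3.0 or later, where after_stat(count) is the current spelling of ..count..; the toy rows are made up.

```r
library(ggplot2)

# Hypothetical helper: hex-bin plot of predicted probability against distance.
make.team.plot = function(data, team.name, model.name, fill, color) {
    ggplot(data) +
        geom_hex(aes(x = dist, y = predict, alpha = after_stat(count)),
                 fill = fill,
                 color = color) +
        labs(title = paste(team.name, "Predicted Goal Probability from",
                           model.name, "Model"),
             x = "Distance from Net",
             y = "Probability of Scoring") +
        theme_minimal()
}

# Made-up rows so the sketch runs without the real predictions.
toy = data.frame(dist = c(5, 12, 30, 45),
                 predict = c(0.30, 0.15, 0.05, 0.02))

toy.plot = make.team.plot(toy, "Pittsburgh", "NNet", "#000000", "#FCB514")
```

With the real data, make.team.plot(pit.nnet, "Pittsburgh", "NNet", "#000000", "#FCB514") would build the same plot as pit.plot.nnet. As with the original plots, printing it requires the hexbin package.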

Boston Bruins

Let’s first get their data.

bos.nnet = get.team.data(nnet.prediction.data, "BOS")
bos.knn = get.team.data(knn.prediction.data, "BOS")

Now, let’s see how the Bruins fared in our models.

bos.plot.nnet = ggplot(bos.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#FFB81C",
             color = "#000000") +
    labs(title = "Boston Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

bos.plot.knn = ggplot(bos.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#FFB81C",
             color = "#000000") +
    labs(title = "Boston Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

bos.plot.nnet

Here is the K-nearest neighbors plot:

bos.plot.knn

Tampa Bay Lightning

Let’s first get their data.

tbl.nnet = get.team.data(nnet.prediction.data, "T.B")
tbl.knn = get.team.data(knn.prediction.data, "T.B")

Now, let’s see how the Lightning fared in our models.

tbl.plot.nnet = ggplot(tbl.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#002868",
             color = "#FFFFFF") +
    labs(title = "Tampa Bay Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

tbl.plot.knn = ggplot(tbl.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#002868",
             color = "#FFFFFF") +
    labs(title = "Tampa Bay Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

tbl.plot.nnet

Here is the K-nearest neighbors plot:

tbl.plot.knn

San Jose Sharks

Let’s first get their data.

sjs.nnet = get.team.data(nnet.prediction.data, "S.J")
sjs.knn = get.team.data(knn.prediction.data, "S.J")

Now, let’s see how the Sharks fared in our models.

sjs.plot.nnet = ggplot(sjs.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#006D75",
             color = "#EA7200") +
    labs(title = "San Jose Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

sjs.plot.knn = ggplot(sjs.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#006D75",
             color = "#EA7200") +
    labs(title = "San Jose Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

sjs.plot.nnet

Here is the K-nearest neighbors plot:

sjs.plot.knn

Nashville Predators

Let’s first get their data.

nsh.nnet = get.team.data(nnet.prediction.data, "NSH")
nsh.knn = get.team.data(knn.prediction.data, "NSH")

Now, let’s see how the Predators fared in our models.

nsh.plot.nnet = ggplot(nsh.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#FFB81C",
             color = "#041E42") +
    labs(title = "Nashville Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

nsh.plot.knn = ggplot(nsh.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#FFB81C",
             color = "#041E42") +
    labs(title = "Nashville Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

nsh.plot.nnet

Here is the K-nearest neighbors plot:

nsh.plot.knn

Los Angeles Kings

Let’s first get their data.

lak.nnet = get.team.data(nnet.prediction.data, "L.A")
lak.knn = get.team.data(knn.prediction.data, "L.A")

Now, let’s see how the Kings fared in our models.

lak.plot.nnet = ggplot(lak.nnet) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#111111",
             color = "#A2AAAD") +
    labs(title = "Los Angeles Predicted Goal Probability from NNet Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

lak.plot.knn = ggplot(lak.knn) +
    geom_hex(aes(x = dist, y = predict, alpha = ..count..),
             fill = "#111111",
             color = "#A2AAAD") +
    labs(title = "Los Angeles Predicted Goal Probability from KNN Model",
         x = "Distance from Net",
         y = "Probability of Scoring") +
    theme_minimal()

Here is the neural network plot:

lak.plot.nnet

Here is the K-nearest neighbors plot:

lak.plot.knn